PCFG Parsing for Restricted Classical Chinese Texts

نویسندگان

  • Liang Huang
  • Yinan Peng
  • Huan Wang
  • Zhenyu Wu
چکیده

The Probabilistic Context-Free Grammar (PCFG) model is widely used for parsing natural languages, including Modern Chinese. But for Classical Chinese, the computer processing is just commencing. Our previous study on the part-of-speech (POS) tagging of Classical Chinese is a pioneering work in this area. Now in this paper, we move on to the PCFG parsing of Classical Chinese texts. We continue to use the same tagset and corpus as our previous study, and apply the bigram-based forward-backward algorithm to obtain the context-dependent probabilities. Then for the PCFG model, we restrict the rewriting rules to be binary/unary rules, which will simplify our programming. A small-sized rule-set was developed that could account for the grammatical phenomena occurred in the corpus. The restriction of texts lies in the limitation on the amount of proper nouns and difficult characters. In our preliminary experiments, the parser gives a promising accuracy of 82.3%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Syntactic Parsing Based on Extended GLR Parsing Algorithm with PCFG*

This paper presents an extended GLR parsing algorithm with grammar PCFG* that is based on Tomita’s GLR parsing algorithm and extends it further. We also define a new grammar—PCFG* that is based on PCFG and assigns not only probability but also frequency associated with each rule. So our syntactic parsing system is implemented based on rule-based approach and statistics approach. Furthermore our...

متن کامل

Self-Training PCFG Grammars with Latent Annotations Across Languages

We investigate the effectiveness of selftraining PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak’s lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from selftraining. We show for the first t...

متن کامل

Corpus-oriented Acquisition of Chinese Grammar

The acquisition of grammar from a corpus is a challenging task in the preparation of a knowledge bank. In this paper, we discuss the extraction of Chinese grammar oriented to a restricted corpus. First, probabilistic context-free grammars (PCFG) are extracted automatically from the Penn Chinese Treebank and are regarded as the baseline rules. Then a corpusoriented grammar is developed by adding...

متن کامل

PCFG parsing with CRF tagging for head recognition

This paper presents our work for participation in the 2009 CIPS-Parseval shared task on Chinese syntactic tree parsing, for which we adopt a general PCFG parsing procedure with a conditional random fields (CRF) tagger for head constituent recognition. Our experiments show that an acceptable tagging result is obtained on the basis of a standard PCFG parsing output and a further evaluation on oth...

متن کامل

An Effective Framework for Chinese Syntactic Parsing

This paper presents an effective framework for Chinese syntactic parsing, which includes two parts. The first one is a parsing framework, which is based on an improved bottom-up chart parsing algorithm, and integrates the idea of the beam search strategy of N best algorithm and heuristic function of A* algorithm for pruning, then get multiple parsing trees. The second is a novel evaluation mode...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002